查看原文
其他

答读者问 (二)为什么我的PCA分析报错了?

BIOMAMBA Biomamba 生信基地 2023-06-15

往期回顾:

答读者问 (一)单基因究竟能否进行GSEA

如上图所示,昨天有一位读者咨询PCA分析报错的问题,首先呢还是建议大家发报错的时候尽量发全一些,有些错误需要考虑到上下文。另外,很多错误R语言已经对用户做出了相应提示(如下文中黄色的部分),经过完整代码的检查后我们找出了这位读者的错误并进行了修改,正确的代码如下:

Biomamba

2021/9/25


数据处理

rm(list = ls())#先清除所有变量
#####读入数据########
data=read.table("GSE.txt",header=T,sep="\t",row.names=1)
data=as.matrix(data)
data.class <- rownames(data)
data.pca <- prcomp(data, scale. = TRUE)#计算PCA
write.table(data.pca$rotation,file="PC.xls",quote=F,sep="\t")
write.table(predict(data.pca),file="PCnewTab.xls",quote=F,sep="\t")
看看数据

BNKCD4TCD8TMonoNeutro
GSM20411830.14736570.19850700.365379300.12259280.1661553
GSM20411840.14292920.24105230.358181700.00978510.2480517
GSM20411850.17769620.20402240.412504500.11071030.0950666
GSM20411860.12878030.21189400.362199000.24367400.0534528
GSM20411870.15865670.18941030.409867300.14292740.0991383
GSM20411880.10694730.23796400.343822400.16766190.1436043
GSM20411890.16442110.18080290.341632500.11021150.2029321
GSM20411900.13302600.18130680.381828200.18086980.1229692
GSM20411910.17374500.17098410.412816600.10082840.1416259
GSM20411920.23080620.20274440.357340600.14953460.0595742
######PCA可视化
library(ggplot2)
## Warning: package 'ggplot2' was built under R version 4.0.5
######正常组和肿瘤组的样品数目
group=c(rep("Normal",79),rep("Tumor",104))
######预测主成分的值
pcaPredict=predict(data.pca)

PC = data.frame(PC1 = pcaPredict[,1], PC2 = pcaPredict[,2],group=group)
head(PC)
## PC1 PC2 group
## GSM2041183 0.75273835 0.67558607 Normal
## GSM2041184 1.84402079 1.97989076 Normal
## GSM2041185 0.92023279 -0.20677388 Normal
## GSM2041186 -0.02871153 -1.26670228 Normal
## GSM2041187 0.69069491 -0.29914324 Normal
## GSM2041188 0.73819726 0.02010291 Normal
######取均值
PCA.mean=aggregate(PC[,1:2],list(group=PC$group),mean)
######自定义函数绘制椭圆
veganCovEllipse<-function (cov, center = c(0, 0), scale = 1, npoints = 100) {
theta <- (0:npoints) * 2 * pi/npoints
Circle <- cbind(cos(theta), sin(theta))
t(center + scale * t(Circle %*% chol(cov)))
}
df_ell <- data.frame()
for(g in unique(PC$group)){
df_ell <- rbind(df_ell, cbind(as.data.frame(with(PC[PC$group==g,],
veganCovEllipse(cov.wt(cbind(PC1,PC2),
wt=rep(1/length(PC1),length(PC1)))$cov,
center=c(mean(PC1),mean(PC2))))),group=g))
}
head(df_ell)
## PC1 PC2 group
## 1 1.529171 0.1845581 Normal
## 2 1.526877 0.2469502 Normal
## 3 1.520002 0.3085994 Normal
## 4 1.508574 0.3692624 Normal
## 5 1.492638 0.4286998 Normal
## 6 1.472257 0.4866770 Normal
#这里原本是levels(PC$group),但是大家可以看一下效果:
levels(PC$group)
## NULL

出图!!!

ggplot(data = PC,aes(PC1, PC2)) + ##原来这里都是PCA,应换成PC
geom_point(aes(color = group)) +
geom_path(data=df_ell, aes(x=PC1, y=PC2,colour=group), size=1, linetype=2, inherit.aes = FALSE) +
annotate("text", x=PCA.mean$PC1, y=PCA.mean$PC2,label=PCA.mean$group) +
theme_bw() +
theme(panel.grid.major = element_blank(), panel.grid.minor = element_blank())


您可能也对以下帖子感兴趣

文章有问题?点此查看未经处理的缓存